Identify error-sensitive patterns by decision tree

Author: E Alpaydin
IA Gheyas
IH Witten
J Han
JR Quinlan
L Breiman
LI Kuncheva
M Hall
P Yang
RE Schapire
S Tabakhi
W Wu
Y Saeys
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015

© Springer International Publishing Switzerland 2015. When errors are inevitable during data classification, systematically finding the part of a classification model that is more susceptible to error than others, rather than casually looking for the model’s Achilles’ heel, may help uncover specific error-sensitive value patterns and lead to additional error reduction measures. As an initial phase of the investigation, this study narrows the scope of the problem by focusing on decision trees as a pilot model. It develops a simple and effective tagging method that digitizes the individual nodes of a binary decision tree for node-level analysis, linking and tracking classification statistics for each node in a transparent way. This facilitates the identification and examination of the potentially “weakest” nodes and error-sensitive value patterns in decision trees, and assists cause analysis and enhancement development. This digitization method is not an attempt to re-develop or transform the existing decision tree model; rather, it is a pragmatic node ID formulation that crafts numeric values to reflect the tree structure and decision-making paths, extending post-classification analysis to the node level. Initial experiments successfully located potentially high-risk attribute and value patterns, an encouraging sign that this study is worth further exploration.
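
The abstract does not state the node ID formula, so the following is only a minimal sketch of the general idea under an assumed scheme: heap-style numbering (root = 1, left child = 2i, right child = 2i + 1), so that each ID encodes the path from the root. The toy tree, the sample data, and the function names `classify_with_path`, `node_error_rates`, and `path_from_id` are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict

# Hypothetical toy tree: internal nodes test x[feature] <= threshold,
# leaves carry a predicted label. This structure is illustrative only.
TREE = {
    "feature": 0, "threshold": 5.0,
    "left": {"label": "A"},
    "right": {
        "feature": 1, "threshold": 2.0,
        "left": {"label": "B"},
        "right": {"label": "A"},
    },
}

def classify_with_path(tree, x):
    """Return (predicted label, list of node IDs visited).

    IDs use heap numbering (an assumption, not the paper's formula):
    root = 1, left child = 2*i, right child = 2*i + 1.
    """
    node, node_id, path = tree, 1, [1]
    while "label" not in node:
        if x[node["feature"]] <= node["threshold"]:
            node, node_id = node["left"], 2 * node_id
        else:
            node, node_id = node["right"], 2 * node_id + 1
        path.append(node_id)
    return node["label"], path

def node_error_rates(tree, samples):
    """Tally visits and misclassifications for every node ID on each path."""
    visits, errors = defaultdict(int), defaultdict(int)
    for x, true_label in samples:
        predicted, path = classify_with_path(tree, x)
        for node_id in path:
            visits[node_id] += 1
            if predicted != true_label:
                errors[node_id] += 1
    return {i: errors[i] / visits[i] for i in visits}

def path_from_id(node_id):
    """Decode an ID into the left/right turns taken from the root."""
    return ["L" if bit == "0" else "R" for bit in bin(node_id)[3:]]

# Hypothetical labelled samples: (feature vector, true label).
SAMPLES = [([3.0, 0.0], "A"), ([7.0, 1.0], "B"),
           ([7.0, 3.0], "B"), ([8.0, 4.0], "A")]

rates = node_error_rates(TREE, SAMPLES)
weakest = max(rates, key=rates.get)  # node ID with the highest error rate
print(weakest, rates[weakest], path_from_id(weakest))
```

Because the ID is a pure function of the path, per-node statistics can be stored in a flat table keyed by ID, and the decision path of any suspicious node can be recovered from the number alone.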

Crossref

OPUS - University of Technology Sydney

Applications of Nature-Inspired Algorithms for Dimension Reduction: Enabling Efficient Data Analytics

Author: A Adeli
A Imteaj
A Zhang
AH Hamamoto
AI Hafez
C Yan
E Hancer
F Harfouchi
F Xie
FG Mohammadi
H Peng
H Rao
H Shi
H Wang
J Kaur
J Pierezan
K Ahmed
K Kira
L Ke
M Gong
M Kumari
M Tubishat
MH Amini
MM Kabir
Mohamed Abd Elaziz
MR Mozafar
N Kozodoi
P Moradi
Q Al-Tashi
Q-T Bui
R Hang
R Vanaja
RR Chhikara
S Arora
S Gupta
S Khan
S Roy
S Tabakhi
V Rostami
X-L Li
X-Y Liu
Y Cao
Y Dong
Y Pathak
Y Xue
Y Zhang
Publication venue: FIU Digital Commons
Publication date: 01/08/2019

In [1], we explored the theoretical aspects of feature selection and evolutionary algorithms. In this chapter, we focus on optimization algorithms for enhancing the data analytics process, i.e., we explore applications of nature-inspired algorithms in data science. Feature selection optimization is a hybrid approach that combines feature selection techniques with evolutionary algorithms to optimize the selected features. Prior works solve this problem iteratively to converge to an optimal feature subset. Feature selection optimization is not specific to any one domain. Data scientists mainly seek advanced ways to analyze data with high computational efficiency and low time complexity, leading to efficient data analytics. As the volume of generated, measured, and sensed data from various sources increases, the analysis, manipulation, and illustration of data grow exponentially. Due to the large scale of data sets, the curse of dimensionality (CoD) is one of the NP-hard problems in data science. Hence, several efforts have focused on leveraging evolutionary algorithms (EAs) to address the complex issues in large-scale data analytics problems. Dimension reduction, together with EAs, lends itself to solving CoD and solving complex problems efficiently in terms of time complexity. In this chapter, we first provide a brief overview of previous studies that focused on solving CoD using the feature extraction optimization process. We then discuss practical examples of research studies that have successfully tackled several application domains, such as image processing, sentiment analysis, network traffic and anomaly analysis, credit score analysis, and analysis of other benchmark functions and data sets.

arXiv.org e-Print Archive

Crossref

DigitalCommons@Florida International University

Multi-agent feature selection for integrative multi-omics analysis

Author: Lu H.
Tabakhi S.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'

Multi-omics data integration is key for cancer prediction as it captures different aspects of molecular mechanisms. Nevertheless, the high dimensionality of multi-omics data, combined with a relatively small number of patients, presents a challenge for cancer prediction tasks. While feature selection techniques have been widely used to tackle the curse of dimensionality of multi-omics data, most existing methods have been applied to each type of omics data separately. In this paper, we propose a multi-agent architecture for feature selection, called MAgentOmics, to consider all omics data together. MAgentOmics extends the ant colony optimization algorithm to multi-omics data, iteratively building candidate solutions and evaluating them. Moreover, a new fitness function is introduced to assess the candidate feature subsets without using a prediction target such as the survival time of patients. Therefore, it can be considered an unsupervised method. We evaluate the performance of MAgentOmics on the TCGA ovarian cancer multi-omics data from 176 patients using 5-fold cross-validation. The results demonstrate that the integration power of MAgentOmics is relatively better than that of the state-of-the-art supervised multi-view method. The code is publicly available at https://github.com/SinaTabakhi/MAgentOmics
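
For readers unfamiliar with ant colony optimization for feature selection, the build-and-evaluate loop described above can be sketched roughly as follows. This is not the MAgentOmics implementation (see the repository linked in the abstract for that): the fitness function here (total variance minus pairwise absolute correlation), the single-view data layout, and the names `abs_corr`, `fitness`, and `aco_select` are all assumptions chosen only to show a label-free score driving pheromone updates.

```python
import random
from statistics import pvariance

def abs_corr(a, b):
    """Absolute Pearson correlation of two columns (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return abs(cov) / (va * vb) ** 0.5

def fitness(columns, subset):
    """Hypothetical label-free score: reward variance, penalize redundancy."""
    relevance = sum(pvariance(columns[j]) for j in subset)
    redundancy = sum(abs_corr(columns[a], columns[b])
                     for i, a in enumerate(subset) for b in subset[i + 1:])
    return relevance - redundancy

def aco_select(columns, k, n_ants=10, n_iters=20, evaporation=0.2, seed=0):
    """Pick k feature indices with a simplified ant colony search."""
    rng = random.Random(seed)
    n = len(columns)
    pheromone = [1.0] * n
    best_subset, best_score = None, float("-inf")
    for _ in range(n_iters):
        for _ in range(n_ants):
            # Each ant draws k distinct features, weighted by pheromone.
            remaining = list(range(n))
            subset = []
            for _ in range(k):
                weights = [pheromone[j] for j in remaining]
                pick = rng.choices(remaining, weights=weights)[0]
                remaining.remove(pick)
                subset.append(pick)
            score = fitness(columns, subset)
            if score > best_score:
                best_subset, best_score = sorted(subset), score
        # Evaporate all trails, then reinforce the best subset found so far.
        pheromone = [(1 - evaporation) * p for p in pheromone]
        for j in best_subset:
            pheromone[j] += 1.0
    return best_subset, best_score

# Hypothetical toy data: one list per feature (column-major).
columns = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 10.0],  # perfectly correlated with feature 0
    [5.0, 1.0, 4.0, 2.0, 3.0],
    [3.0, 3.0, 3.0, 3.0, 3.0],   # constant, carries no information
]
best, score = aco_select(columns, k=2)
print(best, score)
```

The variance and correlation terms are on different scales, so a real use of this sketch would likely standardize the columns or weight the two terms; that choice is left open here.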

White Rose Research Online

Multimodal learning for multi-omics: a survey

Author: Ahadian P.
Lu H.
Suvon M.N.I.
Tabakhi S.
Publication venue: 'World Scientific Pub Co Pte Lt'
Publication date: 16/12/2022

With advanced imaging, sequencing, and profiling technologies, multi-omics data are becoming increasingly available and hold promise for many healthcare applications such as cancer diagnosis and treatment. Multimodal learning for integrative multi-omics analysis can help researchers and practitioners gain deep insights into human diseases and improve clinical decisions. However, several challenges are hindering development in this area, including the limited availability of easily accessible open-source tools. This survey aims to provide an up-to-date overview of the data challenges, fusion approaches, datasets, and software tools from several new perspectives. We identify and investigate various omics data challenges that can help us understand the field better. We categorize fusion approaches comprehensively to cover existing methods in this area. We collect existing open-source tools to facilitate their broader utilization and development. We explore a broad range of omics data modalities and a list of accessible datasets. Finally, we summarize future directions that can potentially address existing gaps and answer the pressing need to advance multimodal learning for multi-omics data analysis.

White Rose Research Online